ABSTRACT



Probability and statistics have become indispensable to scientific, technical, 
and management progress; they serve as essential dialects of mathematics, 
the classical language of science, and as instruments necessary for intelli- 
gent generation and analysis of information. Probability evolved from the 
investigation of gambling problems, and of problems in the analysis of infor- 
mation which contained observational errors. On the other hand, statistics 
evolved from the satisfaction of a governmental requirement for information, 
from the parallel and independent development of a framework in which to 
analyze information, and from the combination of the need for analysis
created by the former with the ability to perform analysis provided by the latter.

A prelude to probability and statistics is presented by examination of the 
important concepts that form their foundation. The brief written discussion 
of these concepts in outline form is augmented by examples and a bibliography. 

*This outline was prepared for use in a cooperative program of the Southern
California Chapter of the American Statistical Association, and the Division 
of Secondary Education in the Los Angeles City School Districts. It forms 
the basis for both a series of lectures to 11th grade students in the Math- 
ematics Summer Honors Program, and a series of lectures to secondary 
mathematics teachers in the Workshop on Probability and Statistics. 
I. INTRODUCTION

A. Probability and statistics have become indispensable to scientific, 
technical, and management progress; they serve as essential dia- 
lects of mathematics, the classical language of science, and as 
instruments necessary for intelligent generation and analysis of 
information. 

1. Probability evolved from the investigation of gambling problems, 
and of problems in the analysis of information which contained 
observational errors; while statistics evolved from the satis- 
faction of a governmental requirement for information, from the 
parallel and independent development of a framework in which 

to analyze information, and from the combination of the need 
for analysis created by the former with the ability for analysis 
provided by the latter. 

2. A prelude to probability and statistics is presented by an exam- 
ination of the important concepts that form their foundation. 

3. The brief written discussion of these concepts in outline form is 
augmented by examples and a bibliography. 

B. An event is something that happens; it is an occurrence or an 
outcome. 

1. Two types of events occur: deterministic and random. 

2. A deterministic event occurs with certainty, and its occurrence 
can be predicted or determined in advance. 

a. A coin falls to rest when tossed. 

b. A fruit borne by an apple tree is an apple. 

c. The Earth rotates on its axis every 24 hours. 

d. The length of this table is . 

3. A random event does not occur with certainty, and its occur- 
rence cannot be predicted or determined in advance. 



a. Will the tossed coin fall with heads up or tails up?

b. How many apples will be borne by an apple tree this year;
and what will be the size, weight, color, flavor, and tex- 
ture of a selected apple? 

c. Will it rain tomorrow, and if so, how much rain will fall? 

d. What will be the length of this table, as measured by a 
selected student? 

C. Probability and statistics deal with random events and the random 
mechanisms which produce them by characterizing and utilizing the 
regularity of randomness. 

1. Probability assumes knowledge concerning a random mechanism, 
and deduces statements concerning the random events which it 
produces. 

2. Statistics assumes knowledge concerning random events, and 
infers statements concerning the random mechanism which 
produces them. 

3. Based upon the above definitions, probability and statistics are 
inverse operations of each other. 

4. Although Mendel's Law regarding plant genetics is probabilistic,
it was inferred statistically from his experiments with garden
peas.

5. Probability and statistics, aided by computer science, are
major elements of the modern technology or modus operandi
of the scientific method (hypothesis to experiment to analysis to
inference to hypothesis, and so forth); while the scientific
disciplines are the raw material of the scientific method.

6. Currently, the secret of success in applying the scientific 
method is a dialogue between a scientist and a statistician. 




D. Certain mathematical concepts are needed as prerequisites to the
discussion of probability and statistics; they are: 

1. A set, A, is a collection of elements, x, that are said to be
contained in the set A.

a. $A = \{x_1, x_2, x_3\}$, and $x_1 \in A$, $x_2 \in A$, and $x_3 \in A$.

b. $A = \{x : \text{mathematical statement concerning } x\}$ and $x \in A$.

2. If S and T are sets, then a function, f, from the domain space,
S, into the range space, T, is a relationship that associates
one and only one element, t, contained in set, T, with each
element, s, contained in set, S.

a. t = f(s).

b. A function, f, is a function of a real variable if the elements 
of set, S, are real numbers. 

c. A function, f, is real-valued if the elements of set, T, are
real numbers.

d. A function, f, is a set function if the elements of set, S, are
themselves sets.

3. A graph for the real-valued function, f, of the real variable, x,
is a pictorial representation of the set, $A = \{(x, y) : y = f(x)\}$,
on a two-dimensional coordinate system.

[Figure: The Graph of f(x)]



4. The number of elements in a set is countably infinite if the
elements are in one-to-one correspondence with the set of
positive integers, and it is uncountably infinite if the elements
are in one-to-one correspondence with the set of real numbers.

5. If the value of the real-valued function, f, becomes arbitrarily
close to b as the value of the real variable, x, becomes
sufficiently close to a, then the limit of f as x approaches a is
said to be b.

a. $\lim_{x \to a} f(x) = b$.

b. $\lim_{n \to \infty} f_n = b$ (x is n and a is $\infty$).



6. If $f_1, f_2, \ldots, f_i, \ldots$ constitute the range of a real-valued
function, f, whose domain is the set of ordered positive integers
(a sequence of real numbers), then the summation of the numbers,
$f_i$, from the value of i being n to the value of i being N
is the sum of the numbers, $f_n, f_{n+1}, \ldots, f_N$.

a. $\sum_{i=n}^{N} f_i = f_n + f_{n+1} + \cdots + f_N$.

b. $\sum_{i=1}^{\infty} f_i = f_1 + f_2 + \cdots + f_i + \cdots$ (n is 1 and N is $\infty$).

7. If f is a real-valued function of the real variable, x, then the
integral of the function, f, from the value of x being a to the
value of x being b is the area between the graph of the positive
values of the function, f, and the x-axis; minus the area between
the graph of the negative values of the function, f, and the
x-axis.

[Figure: The Integral of f(x), showing the graph of y = f(x) and the area $A = \int_a^b f(x)\,dx$]

8. Calculus is the branch of mathematics concerned with these
and related concepts for real numbers; topology is the branch
of mathematics concerned with set, function, graph, limit,
and related concepts in general; and measure theory is the
branch of mathematics concerned with summation, integration,
and related concepts in general.
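
The summation and integral described in this part of the outline can be approximated numerically. The following is a minimal sketch, not part of the original outline: the function names `partial_sum` and `signed_area` and the choice of a midpoint Riemann sum are assumptions made for illustration.

```python
def partial_sum(f, n, N):
    """Return f(n) + f(n+1) + ... + f(N) for a sequence f defined on integers."""
    return sum(f(i) for i in range(n, N + 1))


def signed_area(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with a midpoint Riemann sum.

    Rectangles under positive values of f add area; rectangles over negative
    values of f subtract area, matching the verbal definition of the integral.
    """
    width = (b - a) / steps
    total = 0.0
    for k in range(steps):
        midpoint = a + (k + 0.5) * width
        total += f(midpoint) * width
    return total
```

For instance, `partial_sum(lambda i: i, 1, 100)` adds the integers from 1 to 100, and `signed_area(lambda x: x, -1.0, 2.0)` approximates the area above the x-axis on (0, 2) minus the area below it on (-1, 0).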






II. PROBABILITY 

A. Probability characterizes the uncertainty associated with random
events by expressing the regularity of randomness in mathematical 
terms or numerical form. 

1. Although the occurrence of a random event cannot be predicted 
and repeated trials of a random mechanism do not yield identical 
results regarding a random event, a large collection of such 
results does possess characteristics which are predictable in 
the long run. 

2. This long-run predictability for characteristics associated 
with a random event or a random mechanism is what is meant 
by the regularity of randomness. 

3. The uncertainty associated with a random event is characterized
by expressing the likelihood or probability of occurrence of the
event as a number between zero and one inclusive
($0 \le \text{probability} \le 1$).

4. Parzen* characterizes probability theory as the study of random
phenomena and mathematical models of random phenomena. 

5. As examples, consider the random events listed in I. B. 3 
above. 
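
The regularity of randomness can be observed by simulation. The sketch below is an illustration only: it assumes a fair coin, uses Python's pseudo-random generator as a stand-in for a physical random mechanism, and the function name `relative_frequency_of_heads` is hypothetical.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times; return the fraction of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses
```

Calling the function with n_tosses equal to 10, 100, 10,000, and 1,000,000 typically shows the relative frequency settling near 1/2 as the number of tosses grows, although no single toss can be predicted.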

B. Because probability is a mathematically primitive notion and is 

difficult (if not impossible) to define precisely and rigorously, the 
concept of probability will be characterized rather than defined. 

1. Events are mutually exclusive if the occurrence of any one of 
them precludes or prevents the occurrence of all the others. 



*E. Parzen. Modern Probability Theory and Its Applications. John Wiley
and Sons, Inc., 1960, Pages 1 and 5.






2. Events are equally likely if each is as apt to occur as any
other.

3. Classical characterization: If an event can occur in exactly N
mutually exclusive and equally likely ways and M of these ways
have an attribute, A, then the probability, P(A), of A should be
M/N.

4. A preferable characterization of probability employs the notion
of a random mechanism, called a random experiment.

a. If the outcome of an experiment, $\mathcal{E}$, with possible outcomes,
$E_\alpha$, is not predictable, then $\mathcal{E}$ is called a random
experiment; $E_\alpha$ is called a simple event; any "meaningful"
collection, E, of simple events is called an event; and the
collection, S, of all simple events is called the sample
space.

b. Empirical characterization: If $\mathcal{E}$ is performed n times and
the event, E, occurs m of these times, then the empirical
probability, $P_n(E)$, of E should be m/n; and the probability,
P(E), should be characterized as the limit, in some sense
that exists, of $P_n(E)$ as n becomes infinite.

c. Axiomatic characterization: Let there be associated with
each event, E, a number, P(E), called the probability of
E and having the following characteristics (which are the
characteristics of $P_n(E)$ for a fixed n):

(i) $P(E) \ge 0$.

(ii) $P(S) = 1$.

(iii) $P(E) = P(E_1) + P(E_2)$, where $E_1$ and $E_2$ are two
events which contain no common simple events
(are mutually exclusive) and constitute E when
combined.






d. A set of operating rules for probability, called the calculus 
of probability, may be derived by the application of mathe- 
matical logic to the axiomatic characterization of 
probability. 

e. In this calculus of probability, the law of large numbers
states that $P_n(E)$ essentially approaches P(E) as n becomes
infinite (specifically:
$\lim_{n \to \infty} P\{|P_n(E) - P(E)| \ge \epsilon\} = 0$ for $\epsilon > 0$,
the proof of which is given under II. G. 13).

f. Hence, the empirical and axiomatic characterizations of 
probability are connected (the former being concerned with 
the calculation of probability; and the latter being concerned 
with the characterization of, and consequent operating rules 
for, probability) by making the requirement that P(E) be
a number that satisfies the limit equation.

g. The ultimate measure of the "goodness" or "reasonableness" 
of the random-mechanism characterization of probability
(the empirical and axiomatic characterizations of probability, 
and their connection) is the extent to which the resulting 
calculus of probability is applicable to real problems. 

5. Probability is a function whose domain consists of events, and
whose range is the interval, $0 \le P \le 1$ (a real-valued set
function).
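
The remark that the axioms are the characteristics of the empirical probability for a fixed n can be checked directly. The following sketch assumes a simulated fair die as the random experiment; the names `simulate_rolls`, `empirical_probability`, and `satisfies_axioms`, and the two events chosen inside the check, are illustrative assumptions rather than anything given in the outline.

```python
import random
from fractions import Fraction

SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})  # simple events of one die roll


def simulate_rolls(n, seed=0):
    """Perform the (simulated) random experiment n times."""
    rng = random.Random(seed)
    faces = sorted(SAMPLE_SPACE)
    return [rng.choice(faces) for _ in range(n)]


def empirical_probability(event, outcomes):
    """Return m / n, where the event occurred m times in n trials."""
    m = sum(1 for outcome in outcomes if outcome in event)
    return Fraction(m, len(outcomes))


def satisfies_axioms(outcomes):
    """Check the three characteristics for one fixed set of n trials."""
    e1 = frozenset({1, 2})
    e2 = frozenset({5})  # shares no simple event with e1
    p1 = empirical_probability(e1, outcomes)
    p2 = empirical_probability(e2, outcomes)
    nonnegative = p1 >= 0 and p2 >= 0
    total_is_one = empirical_probability(SAMPLE_SPACE, outcomes) == 1
    additive = empirical_probability(e1 | e2, outcomes) == p1 + p2
    return nonnegative and total_is_one and additive
```

Exact fractions are used so that the additivity check is not disturbed by floating-point rounding.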

C. Given a characterization of the concept of probability, some defini- 
tions and notation that underlie the calculus of probability are 
introduced. 

1. A diagram that depicts events and relationships among events 
in the sample space is called a Venn diagram. 
2. S denotes the sample space, and $A_1, A_2, \ldots, A_n$ denote events.

3. $A_1, A_2, \ldots, A_n$ are exhaustive if at least one of them is
certain to occur.

4. The empty or null event, $\Phi$, is the event which is composed of
no simple events.

5. The event which is composed of all simple events that are not
contained in $A_1$ is called the complement of $A_1$, and denoted by
$A_1^c$.

6. The event which is composed of all simple events that are
simultaneously contained in $A_1, A_2, \ldots, A_n$ is denoted by
$A_1 \cap A_2 \cap \cdots \cap A_n$, and is called the intersection of
$A_1, A_2, \ldots, A_n$.

7. $A_1 \cup A_2 \cup \cdots \cup A_n$ denotes the union of $A_1, A_2, \ldots, A_n$, which
is the event composed of all simple events that are contained in
at least one of $A_1, A_2, \ldots, A_n$.

8. If all simple events which are contained in $A_1$ also are
contained in $A_2$, then $A_1$ is said to be contained in $A_2$, and this is
denoted by $A_1 \subseteq A_2$.

9. $A_1 | A_2$ (read $A_1$, given $A_2$ has occurred) means to consider
that part of $A_1$ which is contained in $A_2$ as an event in the
restricted sample space, $A_2$.

10. $A_1, A_2, \ldots, A_n$ are independent if the occurrence of any one of
them has no effect on the probability of occurrence of any other
one.

D. Consequences of these definitions and notation regarding events 
(which are, of course, sets) are easily derived. 

1. The group of events, $A_1, A_2, \ldots, A_n$, is mutually exclusive
if and only if no two of them intersect; that is,
$A_i \cap A_j = \Phi$ for $i = 1, 2, \ldots, n$, and $j = i + 1, i + 2, \ldots, n$.
2. Some relationships among events are:

a. $(A_1^c)^c = A_1$.

b. $A_1 \cap A_1^c = \Phi$ and $A_1 \cup A_1^c = S$.

c. $A_1 \cap \Phi = \Phi$ and $A_1 \cup \Phi = A_1$.

d. $A_1 \cap A_2 = A_2 \cap A_1$ and $A_1 \cup A_2 = A_2 \cup A_1$.

e. $A_1 \cap (A_2 \cap A_3) = (A_1 \cap A_2) \cap A_3$ and
$A_1 \cup (A_2 \cup A_3) = (A_1 \cup A_2) \cup A_3$.

f. $A_1 \cap (A_2 \cup A_3) = (A_1 \cap A_2) \cup (A_1 \cap A_3)$ and
$A_1 \cup (A_2 \cap A_3) = (A_1 \cup A_2) \cap (A_1 \cup A_3)$.

g. $(A_1 \cap A_2)^c = A_1^c \cup A_2^c$ and $(A_1 \cup A_2)^c = A_1^c \cap A_2^c$.

h. $A_1 \cap A_2$ is that part of $A_1$ which is also contained in $A_2$,
and is contained in both $A_1$ and $A_2$; and $A_1 \cap A_2^c$ is that part
of $A_1$ which is not contained in $A_2$, and is contained in $A_1$.

i. $A_1 \cap A_2^c$, $A_1^c \cap A_2$, and $A_1 \cap A_2$ are mutually exclusive and
$A_1 \cup A_2 = (A_1 \cap A_2^c) \cup (A_1^c \cap A_2) \cup (A_1 \cap A_2)$.

j. $A_1 \cap A_2^c \cap A_3^c$, $A_1^c \cap A_2 \cap A_3^c$, $A_1^c \cap A_2^c \cap A_3$, $A_1 \cap A_2 \cap A_3^c$,
$A_1 \cap A_2^c \cap A_3$, $A_1^c \cap A_2 \cap A_3$, and $A_1 \cap A_2 \cap A_3$ are
mutually exclusive; and
$A_1 \cup A_2 \cup A_3 = (A_1 \cap A_2^c \cap A_3^c) \cup (A_1^c \cap A_2 \cap A_3^c) \cup (A_1^c \cap A_2^c \cap A_3) \cup (A_1 \cap A_2 \cap A_3^c) \cup (A_1 \cap A_2^c \cap A_3) \cup (A_1^c \cap A_2 \cap A_3) \cup (A_1 \cap A_2 \cap A_3)$.

k. As an example, consider $S = \{1, 2, 3, 4, 5, 6\}$, $A_1 = \{1, 2\}$,
$A_2 = \{2, 4, 6\}$, and $A_3 = \{5\}$.

3. $A_1, A_2, \ldots, A_n$ are exhaustive if $A_1 \cup A_2 \cup \cdots \cup A_n = S$,
and if and only if $P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1$.
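
Several of these relationships can be verified on the example sample space of item 2.k by treating events as Python sets. This is a sketch for illustration; the helper names `complement`, `mutually_exclusive`, and `exhaustive` are assumptions and do not appear in the outline.

```python
S = frozenset({1, 2, 3, 4, 5, 6})  # sample space from the example
A1 = frozenset({1, 2})
A2 = frozenset({2, 4, 6})
A3 = frozenset({5})


def complement(event):
    """All simple events of S that are not contained in the event."""
    return S - event


def mutually_exclusive(*events):
    """True if no two of the given events intersect."""
    return all(not (events[i] & events[j])
               for i in range(len(events))
               for j in range(i + 1, len(events)))


def exhaustive(*events):
    """True if the union of the given events is the whole sample space."""
    return frozenset().union(*events) == S


# Relationship g: complement of an intersection or union.
de_morgan_holds = (
    complement(A1 & A2) == complement(A1) | complement(A2)
    and complement(A1 | A2) == complement(A1) & complement(A2)
)

# Relationship i: the union of A1 and A2 split into three disjoint parts.
pieces = (A1 & complement(A2), complement(A1) & A2, A1 & A2)
decomposition_holds = (
    mutually_exclusive(*pieces)
    and frozenset().union(*pieces) == A1 | A2
)
```

In this example A1 and A3 are mutually exclusive while A1 and A2 are not, and A1, A2, A3 are not exhaustive because the outcome 3 belongs to none of them.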
E. The following elementary results serve as examples of the calculus
of probability:

1. $P(\Phi) = 0$ and $P(S) = 1$.

2. If $A_1 \subseteq A_2$, then $P(A_1) \le P(A_2)$.

3. $P(A_1^c) = 1 - P(A_1)$ and $P(A_1 \cap A_2^c) = P(A_1) - P(A_1 \cap A_2)$.

4. $P(A_1 \cup A_2 \cup \cdots \cup A_n) \le \sum_{i=1}^{n} P(A_i)$, with equality holding if
and only if $A_1, A_2, \ldots, A_n$ are mutually exclusive.

5. $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$.

6. $P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$.

7. If $P(A_2) > 0$, then
$P(A_1 | A_2) = P(A_1 \cap A_2) / P(A_2)$ or
$P(A_1 \cap A_2) = P(A_1 | A_2) P(A_2)$.

8. Conditional probabilities satisfy the same relationships as
(unconditional) probabilities, so long as all conditional
probabilities are defined.

9. $A_1$ and $A_2$ are independent if and only if $P(A_1 \cap A_2) = P(A_1) P(A_2)$;
whereas, $P(A_1 | A_2) = P(A_1)$ (or $P(A_2 | A_1) = P(A_2)$)
if $A_1$ and $A_2$ are independent and $P(A_2) > 0$ (or $P(A_1) > 0$).

10. Mutual exclusiveness or lack of it is a property of events which
is solely determined by the properties of events as sets;
independence or lack of it is a property of events which is
solely determined by the properties of the probability function
defined over them.
11. Bayes Theorem: If $A_1, A_2, \ldots, A_{n-1}$ are mutually exclusive
and $A_n$ can occur only if one of $A_1, A_2, \ldots, A_{n-1}$ occurs
($A_n \subseteq A_1 \cup A_2 \cup \cdots \cup A_{n-1}$), then

$P(A_n) = \sum_{i=1}^{n-1} P(A_n \cap A_i) = \sum_{i=1}^{n-1} P(A_n | A_i) P(A_i)$ and

$P(A_j | A_n) = P(A_n | A_j) P(A_j) / \sum_{i=1}^{n-1} P(A_n | A_i) P(A_i)$ for
$j = 1, 2, \ldots, n-1$.

12. It is informative to reconsider the above example, with $P(x) = 1/6$
for $x \in S$.
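
Reconsidering the example with each outcome assigned probability 1/6 might look as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the three-set `PARTITION` used for Bayes Theorem is introduced here only for illustration and is not part of the original example.

```python
from fractions import Fraction

S = frozenset({1, 2, 3, 4, 5, 6})
A1 = frozenset({1, 2})
A2 = frozenset({2, 4, 6})
A3 = frozenset({5})

# Hypothetical mutually exclusive events whose union contains A2.
PARTITION = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]


def prob(event):
    """P(event) when every simple event of S has probability 1/6."""
    return Fraction(len(event & S), len(S))


def conditional(event, given):
    """P(event | given); defined only when P(given) > 0."""
    return prob(event & given) / prob(given)


def independent(a, b):
    """True if P(a and b) equals P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)


def bayes_posterior(j, causes, effect):
    """P(causes[j] | effect) computed from P(effect | cause) and P(cause)."""
    numerator = conditional(effect, causes[j]) * prob(causes[j])
    denominator = sum(conditional(effect, c) * prob(c) for c in causes)
    return numerator / denominator
```

With these probabilities, A1 and A2 are independent although they are not mutually exclusive, while A1 and A3 are mutually exclusive but not independent, which illustrates the distinction drawn in item 10.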



F. The following comments concerning the calculation of probabilities 
are a complement to the above results concerning the manipulation 
of probabilities: 

1. If $S = \{E_1, E_2, \ldots, E_n\}$, all $E_i$'s are equally likely ($P(E_i) = 1/n$
for $i = 1, 2, \ldots, n$), and $A = \{E_1, E_2, \ldots, E_m\}$, then the size
of A (number of $E_i$'s which constitute A) is m and $P(A) = m/n$.

2. When previously selected objects are (are not) replaced before
the next selection, the sampling procedure is called sampling
with (without) replacement.

3. If an event, $A_1$, can occur in $N_1$ ways, for each of these ways a
different event, $A_2$, can occur in $N_2$ ways, ..., and for each
of these ways a different event, $A_m$, can occur in $N_m$ ways,
then the m events, $A_1, A_2, \ldots, A_m$, can occur in combination
in $N_1 N_2 \cdots N_m$ ways.

4. The number of permutations, $(n)_k$, of n objects selected k at
a time is the number of arranged (with regard to order of
selection) ways that k objects can be selected from a group
of n objects.






5. The number of combinations, $\binom{n}{k}$, of n objects selected k at a
time is the number of unarranged (with regard to order of
selection) ways that k objects can be selected from a group
of n objects.

6. If n is a positive integer, then the product of n, (n-1), ..., 2, 1,
is called n factorial and written n!. For convenience, the
symbol, 0!, is defined to be 1.

7. Then $(n)_k = n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}$, and
$\binom{n}{k} = \frac{(n)_k}{k!} = \frac{n!}{k!(n-k)!} = \binom{n}{n-k}$.

8. Select the most accurate and complete sample space for the
problem before attempting to compute probabilities, and be
careful; simplifications and shortcuts are justified only if
they produce the same results.

9. Two illustrative examples are provided by the following 
problems: 

a. Calculate the probability of obtaining two heads and one 
tail in three tosses of a fair coin, with and without 
employing combinations. 

b. Calculate the probability of rolling a five when two fair 
dice are rolled. 
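
The counting formulas and the two illustrative problems can be worked by direct enumeration. In the sketch below the function names are hypothetical, and "rolling a five" is interpreted as the two dice summing to five, which is an assumption about the intended problem.

```python
import math
from fractions import Fraction
from itertools import product


def permutations_count(n, k):
    """(n)_k = n! / (n - k)!: ordered selections of k objects from n."""
    return math.factorial(n) // math.factorial(n - k)


def combinations_count(n, k):
    """n! / (k! (n - k)!): unordered selections of k objects from n."""
    return permutations_count(n, k) // math.factorial(k)


def two_heads_one_tail_by_enumeration():
    """List all 8 equally likely outcomes of three tosses and count."""
    outcomes = list(product("HT", repeat=3))
    favorable = [o for o in outcomes if o.count("H") == 2]
    return Fraction(len(favorable), len(outcomes))


def two_heads_one_tail_by_combinations():
    """Choose which 2 of the 3 tosses are heads, out of 2**3 outcomes."""
    return Fraction(combinations_count(3, 2), 2 ** 3)


def sum_of_two_dice_is(total):
    """Probability that two fair dice sum to the given total."""
    outcomes = list(product(range(1, 7), repeat=2))
    favorable = [o for o in outcomes if sum(o) == total]
    return Fraction(len(favorable), len(outcomes))
```

Both routes to the coin problem use the full sample space of 8 ordered outcomes, in line with the advice of item 8; the combinations shortcut is justified because it produces the same result as the enumeration.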



G. Finally, additional probabilistic concepts are useful prerequisites 
for the discussion of statistics. 

1. A random variable, X, is a real-valued function defined over
the sample space, S, of a random experiment, $\mathcal{E}$.

2. The probability associated with a value of the random variable,
X, is the probability associated with the corresponding
outcome(s) of the random experiment.
3. A discrete random variable, X, assumes only a countable number
of isolated values, $x_1, x_2, \ldots$; a continuous random variable
assumes every value in some interval.

4. The probability distribution of a random variable, X, is a
description of the distribution of probability over
the values associated with the random variable, X.

5. The cumulative distribution function, F, of a random variable,
X, represents the probability that the random variable, X, does
not exceed a given value, x:

$F(x) = P(X \le x)$.

6. The probability density function, f, of a random variable, X,
represents the probability that the random variable, X, assumes
a given value, x:

a. For a discrete random variable, X, $f(x) = P(X = x)$.

b. For a continuous random variable, X, $f(x)\,dx \approx P(x < X \le x + dx)$.

7. If h is a real-valued function of a real variable and X is a
random variable, then the expected value, $E[h(X)]$, of the
random variable, h(X), is the weighted average of the function,
h, with respect to the probability distribution of the random
variable, X.

a. For a discrete random variable, X,

$E[h(X)] = \sum_{i=1}^{\infty} h(x_i) f(x_i)$.

b. For a continuous random variable, X,

$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$.
8. The expected value, E(X), of a random variable, X, is called
its mean, $\mu$; and $\mu$ may be thought of as the "center of
probability mass."

9. The expected value, $E[(X - \mu)^2]$, of the squared deviation of
a random variable, X, from its mean, $\mu$, is called its
variance, $\sigma^2$; and $\sigma^2$ may be thought of as the "moment of
probability inertia."

10. The binomial distribution with parameters, $n = 1, 2, \ldots$ and
$0 \le p \le 1$, has

$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$

for $x = 0, 1, \ldots, n$; and it is an example of a probability
distribution for a discrete random variable.

11. The normal distribution with parameters, $-\infty < \mu < \infty$ and $\sigma > 0$,
has

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$

for $-\infty < x < \infty$;

and it is an example of a probability distribution for a
continuous random variable.

12. Chebyshev's inequality: If X is a discrete random variable
with mean, $\mu$, and variance, $\sigma^2$, then

$P(|X - \mu| \ge t) \le \sigma^2 / t^2$ for $t > 0$;

and the proof of this follows:

a. By definition,

$\sigma^2 = \sum_{i=1}^{\infty} (x_i - \mu)^2 f(x_i)$.

b. Grouping $x_1, x_2, \ldots$ into two sets, $A_1 = \{x_i : |x_i - \mu| < t\}$
and $A_2 = \{x_i : |x_i - \mu| \ge t\}$, produces

$\sigma^2 = \sum_{x_i \in A_1} (x_i - \mu)^2 f(x_i) + \sum_{x_i \in A_2} (x_i - \mu)^2 f(x_i)$.

c. Because $(x_i - \mu)^2 \ge 0$ and $f(x_i) \ge 0$,

$\sum_{x_i \in A_1} (x_i - \mu)^2 f(x_i) \ge 0$ and

$\sigma^2 \ge \sum_{x_i \in A_2} (x_i - \mu)^2 f(x_i)$.

d. By definition,

$\sum_{x_i \in A_2} (x_i - \mu)^2 f(x_i) \ge t^2 \sum_{x_i \in A_2} f(x_i)$

and

$\sum_{x_i \in A_2} f(x_i) = P(|X - \mu| \ge t)$.

e. Then

$\sigma^2 \ge t^2 P(|X - \mu| \ge t)$.

f. Dividing both sides of this inequality by $t^2$ yields

$P(|X - \mu| \ge t) \le \sigma^2 / t^2$ for $t > 0$.
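
Chebyshev's inequality can be checked exactly for a small discrete distribution. The sketch below assumes, purely for illustration, that X is the face shown by a fair die; the names `DIE_PMF`, `expected_value`, `tail_probability`, and `chebyshev_bound` are hypothetical.

```python
from fractions import Fraction

# Probability density function f of a fair die: f(x) = 1/6 for x = 1, ..., 6.
DIE_PMF = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(h, pmf):
    """E[h(X)] as the sum of h(x) f(x) over the values of X."""
    return sum(h(x) * p for x, p in pmf.items())


def mean(pmf):
    return expected_value(lambda x: x, pmf)


def variance(pmf):
    mu = mean(pmf)
    return expected_value(lambda x: (x - mu) ** 2, pmf)


def tail_probability(pmf, t):
    """P(|X - mu| >= t), summing f over the set called A_2 in the proof."""
    mu = mean(pmf)
    return sum(p for x, p in pmf.items() if abs(x - mu) >= t)


def chebyshev_bound(pmf, t):
    """The upper bound sigma^2 / t^2 from Chebyshev's inequality."""
    return variance(pmf) / Fraction(t) ** 2
```

For the die, the mean is 7/2 and the variance is 35/12. With t = 2 the tail probability is 1/3 and the bound is 35/48, which shows that the inequality holds but can be far from tight.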

13. Law of large numbers: If $X_1, X_2, \ldots, X_n$ are independent,
identically distributed, and discrete random variables with mean,
$\mu$, and variance, $\sigma^2$, and

$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, then

$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0$ for $\epsilon > 0$;

and the proof of this is given by:

a. If the discrete random variable, X, has probability density
function, f, and a is a constant, then $E(a) = a$ and
$E(aX) = aE(X)$; and its proof follows.

(i) By definition, $E(a) = a(1) = a$.

(ii) Also by definition,

$E(aX) = \sum_{i=1}^{\infty} a x_i f(x_i) = a \sum_{i=1}^{\infty} x_i f(x_i) = aE(X)$.

b. It can be shown that $E(X + Y) = E(X) + E(Y)$ for X and Y
being discrete random variables.

c. The combination of the above two properties of expectation
implies that

$E(\sum_{i=1}^{n} a_i X_i) = \sum_{i=1}^{n} a_i E(X_i)$

for $X_1, X_2, \ldots, X_n$ being discrete random variables and
$a_1, a_2, \ldots, a_n$ being constants.

d. Also it can be demonstrated that $E[(X - \mu)(Y - \nu)] = 0$ for
X and Y being independent and discrete random variables
with means, $\mu$ and $\nu$.

e. Using these properties of expectation produces

(i) $E(\bar{X}_n) = E(\frac{1}{n} \sum_{i=1}^{n} X_i) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$.

(ii) $E[(\bar{X}_n - \mu)^2] = E[(\frac{1}{n} \sum_{i=1}^{n} X_i - \mu)^2]$

$= E\{[\frac{1}{n}(\sum_{i=1}^{n} X_i - n\mu)]^2\} = E[\frac{1}{n^2}(\sum_{i=1}^{n} X_i - n\mu)^2]$

$= \frac{1}{n^2} E[(\sum_{i=1}^{n} X_i - n\mu)^2] = \frac{1}{n^2} E\{[\sum_{i=1}^{n} (X_i - \mu)]^2\}$

$= \frac{1}{n^2} E[\sum_{i=1}^{n} (X_i - \mu)^2 + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (X_i - \mu)(X_j - \mu)]$

$= \frac{1}{n^2} \{E[\sum_{i=1}^{n} (X_i - \mu)^2] + E[2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (X_i - \mu)(X_j - \mu)]\}$

$= \frac{1}{n^2} \{\sum_{i=1}^{n} E[(X_i - \mu)^2] + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[(X_i - \mu)(X_j - \mu)]\}$

$= \frac{1}{n^2} [n\sigma^2 + 2(0)] = \sigma^2 / n$.

f. Since $X_1, X_2, \ldots, X_n$ are discrete random variables, $\bar{X}_n$
is also a discrete random variable.

g. Hence, $\bar{X}_n$ is a discrete random variable with mean, $\mu$,
and variance, $\sigma^2 / n$.

h. By Chebyshev's inequality with t replaced by $\epsilon$,

$P(|\bar{X}_n - \mu| \ge \epsilon) \le \sigma^2 / n\epsilon^2$ for $\epsilon > 0$.

i. Because $\sigma^2 / n\epsilon^2$ approaches zero as n approaches infinity,

$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0$ for $\epsilon > 0$.

14. The form of the law of large numbers employed in II. B. 4. e 
is a special case of the preceding result, in which: 

a. Each $X_i$ is 0 when the event, E, does not occur, and is 1 
when the event, E, does occur. 

b. Then $P(X_i = 0) = 1 - P(E)$ and $P(X_i = 1) = P(E)$ for i = 1, 2, .... 

c. In addition, $\bar{X}_n = P_n(E)$ and $\mu = P(E)$. 
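
The special case of indicator variables can be sketched as follows. This is an illustration under stated assumptions rather than part of the outline: the event probability 0.25 in the usage lines is an arbitrary choice, and the function names are hypothetical. The first function computes the mean and variance of a single indicator directly from the definition of expectation; the second computes the sample mean of n indicators, which is the relative frequency of the event.

```python
import random


def indicator_mean_and_variance(p_event):
    """Mean and variance of X, where X = 1 with probability p_event and 0 otherwise."""
    mu = 0 * (1 - p_event) + 1 * p_event
    var = (0 - mu) ** 2 * (1 - p_event) + (1 - mu) ** 2 * p_event
    return mu, var


def relative_frequency(p_event, n, seed=0):
    """Sample mean of n simulated indicator variables (relative frequency of the event)."""
    rng = random.Random(seed)
    indicators = [1 if rng.random() < p_event else 0 for _ in range(n)]
    return sum(indicators) / n


if __name__ == "__main__":
    print(indicator_mean_and_variance(0.25))
    for n in (100, 10_000, 100_000):
        print(n, relative_frequency(0.25, n))
```

As n grows, the printed relative frequencies should settle near the assumed probability, which is what the law of large numbers asserts.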
III. STATISTICS 



A. Statistics is concerned mainly with the proper generation or 

collection of, description of, and inference based upon random 
data which represent outcomes of random experiments. 

1. Descriptive statistics involves the description and 
summarization of random data. 

2. Inferential statistics involves the drawing of inferences in the 
presence of uncertainty resulting from randomness or incomplete 
information; these inferences are based upon data, and serve as 

a basis for rational and objective decisions. 

3. Experimental design involves the efficient performance of 
meaningful random experiments for either description or 
inference. 

4. These three phases of statistics typically follow one another 
in an iterative cycle. 

a. With the help of a scientist, the statistician first 
describes the data which he is investigating, then 
draws an inference based upon the data, and finally 
designs additional experiments to investigate certain 
aspects of the random mechanism more thoroughly. 

[Figure: The Three Phases of Statistics] 

b. After these experiments are conducted, the second 
cycle begins with an analysis of the data from these 
experiments, and so forth. 

B. Areas in which Statistics is Applied. 

1. Four general areas in which statistics is applied are 
scientific research, government, management, and 
daily life. 

2. Four stages occur in the evolution of a discipline from 
an art into a science: description, modeling, prediction, 
and control and optimization. Adolphe Quetelet, who 
may be viewed as the first modern statistician, said that 
"we can judge of the perfection to which a science has come 
by the facility more or less great, with which it may be 
approached by calculation."* 

a. The first stage in the development of a discipline 
is the collection, structuring, and description of 
information by scientists. 

(i) Because a discipline cannot even begin without 
such information or observations, technological 
advances are often required first (as for 
astronomy, which had progressed very little for 
thousands of years, until the invention of the 
telescope). 

(ii) Examples of disciplines which currently are mainly 
in the descriptive stage are most social sciences 
(except economics and psychology), oceanography, 
and information science. 

*H. M. Walker. Studies in the History of Statistical Method. The Williams 
and Wilkins Company, 1929, Page 39. 

(iii) Scientists in these fields use mainly descriptive 

statistics, but also draw some inferences and use 
some principles of experimental design. 

b. Given a sufficient amount of structuring and description, 
scientists then construct and estimate models of relationships 
among component parts of the structured information. 

(i) Models may be physical, electromechanical 
analog, mathematical, or digital computer. 

(ii) Models are required generally for any type of 
analysis, and are required particularly for 
prediction, or control and optimization. 

(iii) Examples of disciplines which currently are mainly 
in the modeling stage are most biological sciences 
(except agricultural science and genetics), medical 
science, management science, and psychology. 

(iv) Scientists in these fields use mainly modeling 
techniques and inferential statistics. 

c. Through the use of models (predominantly mathematical, 
but increasingly electromechanical analog and digital 
computer), scientists are able to predict and simulate 
experiments. 

(i) Because the accuracy and precision of such 
predictions and simulations depend upon the 
accuracy and precision of the models, a discipline 
may approach this stage of development when 
crude models exist, but only enters it when refined 
and representative models are constructed. 



(ii) Examples of disciplines which currently are mainly 
in the prediction stage are economics, astronomy, 
engineering, meteorology, and genetics. 

(iii) Scientists in these fields use mainly inferential 
statistics. 

d. The final stage in the development of a discipline is 
reached when it is able to provide both the methods 
and the technology for control, improvement, or 
optimization. 

(i) Examples of disciplines which currently are at 
least partially in the control and optimization 
stage are the physical and agricultural sciences 
(except in backward countries). 

(ii) Scientists in these fields use mainly mathematical, 
electromechanical analog, and digital computer 
models, as well as inferential statistics. 

3. Long a user of economic, labor, medical, and population 
statistics, government is increasingly using a widening 
range of statistical data, philosophy, and techniques. 

4. Statistical philosophy and techniques are increasingly used 
as an aid to important management decisions. 

5. Each of us is required to make many decisions every day 
(for example, should I take an umbrella this morning?); 
although most do not require formal analysis (in fact, 
many are subjective rather than objective), an understanding 
of statistical philosophy and techniques can be quite useful 
in many ways: interpretation of advertising, playing games 
of chance, comprehending scientific reports in the press, 
understanding weather and other predictions, following the 
stock market and its analysis, and so forth. 

C. Descriptive Statistics 

1. Descriptive statistics concerns the collection and description 
of both numerical data (such as heights of boys 14 years old, 
or the number of voters preferring a specified candidate) 
and non-numerical data (such as whether a given brand of 
toothpaste tastes good, fair, or poor; or tastes better than, 
or not as good as, a competing brand). 

a. Although this collection of data is generally the layman's 
concept of statistics, statisticians are rarely involved 
in the collection of data; but they are involved in its 
analysis and interpretation, or in experimental design. 

b. To the statistician and the experimenter, the major 
use of descriptive statistics is to provide a basis for 
inferential statistics. 

2. Concepts and Techniques 

a. A population is the collection of possible values for a 
random variable, X, which may be very large or even 
infinite (in this sense, the term refers to a conceptual, 
rather than an existing, collection). 

b. A parameter, $\theta$, is a measurable characteristic of the 
random variable, X, which frequently appears explicitly 
in the mathematical formula for its probability distribution. 

c. A sample is a collection of observed values, $X_1, X_2, \ldots, X_n$, 
for the random variable, X. 

d. A statistic is a measurable characteristic of the sample, 
$X_1, X_2, \ldots, X_n$. 

e. Thus, sample is to population as statistic is to parameter. 




f. Probability assumes knowledge concerning the 
probability distribution of a random variable, X, and 
deduces statements concerning a sample, $X_1, X_2, \ldots, X_n$; 
whereas statistics assumes knowledge concerning 
a sample, $X_1, X_2, \ldots, X_n$, and infers statements 
concerning the probability distribution of the random 
variable, X. 

g. A histogram is a graph describing the distribution of 
probability over the sample, $X_1, X_2, \ldots, X_n$. 

h. Measures of location are used to describe the location 
of the center of a sample probability distribution: if 
$X_1, X_2, \ldots, X_n$ is a sample, then the sample mean, 

$\bar{X} = \left(\sum_{i=1}^{n} X_i\right)/n$, 

is their average; the sample median is a value such 
that one-half of the observations are greater than or 
equal to it, and one-half of them are less than or equal 
to it; and the sample mode is the most frequently occurring 
value in the sample (that is, the highest point of 
the histogram). 

i. Measures of dispersion are used to describe the extent 
to which the sample probability distribution is 
dispersed over its range of values: if $X_1, X_2, \ldots, X_n$ 
is a sample, then the sample variance, 
especially for very large populations which change 
with time (when collection and analysis of such 
vast amounts of data may be so time-consuming 
that the population will have changed by the time 
the results are available). 

c. Although inferential statistics has traditionally been 
viewed as being composed of theoretical statistics and 
applied statistics, a type of statistics, called developmental 
statistics (as in the research, development and 
production cycle), has been emerging between them. 

(i) Theoretical statistics deals with research, mainly 
mathematical, on statistical methodology; and it 
concerns both investigation of the properties of 
existing methodologies, and derivation of new 
methodologies for the construction of efficient 
experimental designs, as well as for the efficient 
extraction and use of information contained in the 
data. 

(ii) Developmental statistics deals with development of 
statistical techniques from existing methodologies 
for application to the design of experiments for, and 
the analysis and interpretation of data from, scientific 
or other investigations; this generally requires a 
familiarity with both theoretical and applied statistics. 

(iii) Applied statistics deals with application of the 
best available statistical techniques to design 
experiments for, and analyze and interpret data 
from, scientific or other investigations; this 
is generally accomplished by the applied statistician 
working directly with the experimenter. 
is essentially the average of $(X_1 - \bar{X})^2$, $(X_2 - \bar{X})^2$, 
..., $(X_n - \bar{X})^2$; while the sample range is the difference 
between the largest and the smallest values in the 
sample. 
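
The measures of location and dispersion described above can be computed with Python's standard statistics module, as in the sketch below. The data are hypothetical heights in inches, chosen only for illustration, and the function name describe is likewise an assumption. Because the outline says only that the sample variance is "essentially" an average of squared deviations, both common conventions are shown: division by n and division by n - 1.

```python
import statistics


def describe(sample):
    """Return common measures of location and dispersion for a numerical sample."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        # Average squared deviation from the sample mean (divides by n).
        "variance_n": statistics.pvariance(sample),
        # Same sum of squared deviations, divided by n - 1.
        "variance_n_minus_1": statistics.variance(sample),
        "range": max(sample) - min(sample),
    }


if __name__ == "__main__":
    heights = [60, 62, 62, 63, 65, 66]  # hypothetical data, in inches
    for name, value in describe(heights).items():
        print(name, value)
```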

D. Inferential Statistics 

1. Inferential statistics involves the drawing of inferences in the 
presence of uncertainty resulting from randomness or incomplete 
information; these inferences are based upon data, and serve as 
a basis for rational and objective decisions. 

a. In a sense, probability and statistics, both of which 
are concerned with the concept of randomness, look in 
opposite directions from this central point: probability 
seeks to construct mathematical models of randomness, 
while statistics seeks to accomplish objective 
decision-making in the presence of randomness, much 
of which depends upon probability. 

b. Among the reasons for not obtaining complete information 
in a given problem are: 

(i) Complete information may be too expensive, 
especially if the population is very large or 
resources are limited (as they usually are). 

(ii) Complete information may be impossible to 
obtain, especially for an infinite population 
(all possible corn plants cannot be grown). 

(iii) Complete information may be useless, especially 
if the population is used up (as in the testing of 
bullets). 

(iv) Complete information may be less accurate 
(although more precise) than partial information, 












2. It is frequently both convenient and informative to view 
inferential statistics as a game against nature, in which 
the objective is to infer the true state of nature (probability 
distribution defined over the population) from the 
experimental data, so that rational and objective decisions 
may be made in the presence of uncertainty. 

a. The moves in this game are: 

(i) Nature selects a state of nature (population 
probability distribution) from the family of 
possible states of nature. 

(ii) The experimenter (with the statistician) 

precisely defines the population under study, 
including his population model for the family 
of possible states of nature, and formulates 
the objectives of the investigation (including 
theories, criteria and constraints). 

(iii) The statistician (with the experimenter) 
designs the experiment to achieve the 
objectives of the investigation. 

(iv) The experimenter performs the experiment 
and collects the data. 

(v) The statistician describes the data. 

(vi) The statistician infers the true state of 

nature, based upon the description of the 
data and the population model. 

(vii) The experimenter (with the statistician) makes 
a decision, based upon the inferred state of 
nature, and he sustains a loss (where a win is 
considered a negative loss), which depends 
upon the decision and the true state of nature. 




(viii) In actuality, the second move frequently is postponed 
until after the fourth one, and the third 
move frequently is omitted; however, this 
unfortunate fact has often forced the development 
of new methodologies and techniques. 

b. Estimation is a statistical inference that produces 
statements about the population model, usually in 
terms of its parameter(s). 



(i) An estimator, $\hat{\theta}(X_1, X_2, \ldots, X_n)$, of the parameter, 
$\theta$, is a recipe (or function) for combining 
the observed values, $X_1, X_2, \ldots, X_n$, in the 
sample to estimate the parameter. 

(ii) An estimate, $\hat{\theta}(x_1, x_2, \ldots, x_n)$, is a particular 
outcome of the recipe (or value of the function), 
$\hat{\theta}(X_1, X_2, \ldots, X_n)$, for a given sample, $X_1 = x_1$, 
$X_2 = x_2$, ..., $X_n = x_n$. 

(iii) An interval estimator, $[\hat{\theta}_L(X_1, X_2, \ldots, X_n), 
\hat{\theta}_U(X_1, X_2, \ldots, X_n)]$, of the parameter, $\theta$, is a 
pair of recipes, $\hat{\theta}_L(X_1, X_2, \ldots, X_n)$ and 
$\hat{\theta}_U(X_1, X_2, \ldots, X_n)$, for combining the observed 
values, $X_1, X_2, \ldots, X_n$, in the sample to estimate 
the lower and upper ends of an interval, from $\hat{\theta}_L$ to $\hat{\theta}_U$, 
that contains $\theta$. 

(iv) A confidence interval is the combination of an 
interval estimator and the amount of confidence 
placed in it, as measured by the probability, 
$1 - \alpha$, of its being correct: 

(a) $P[\hat{\theta}_L(X_1, X_2, \ldots, X_n) \le \theta \le \hat{\theta}_U(X_1, X_2, \ldots, X_n)] = 1 - \alpha$; 

(b) $P[\theta \ge \hat{\theta}_L(X_1, X_2, \ldots, X_n)] = 1 - \alpha$, 
where $\hat{\theta}_U(X_1, X_2, \ldots, X_n)$ is $\infty$; and 

(c) $P[\theta \le \hat{\theta}_U(X_1, X_2, \ldots, X_n)] = 1 - \alpha$, 
where $\hat{\theta}_L(X_1, X_2, \ldots, X_n)$ is $-\infty$. 
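
The notions of estimator, estimate, and two-sided confidence interval can be sketched for one familiar case. The assumptions here go beyond the outline and are made only for illustration: the population is taken to be normal with a known standard deviation, sigma, and the parameter of interest is the mean. The function names are hypothetical. The first function is a pair of recipes (lower and upper ends); applying it to particular data yields an interval estimate. The second function checks by simulation how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist, mean


def mean_confidence_interval(sample, sigma, alpha=0.05):
    """Two-sided (1 - alpha) interval for a normal mean, assuming sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    half_width = z * sigma / math.sqrt(n)
    center = mean(sample)  # the point estimate of the mean
    return center - half_width, center + half_width


def coverage(mu, sigma, n, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean, mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = mean_confidence_interval(sample, sigma, alpha)
        if lower <= mu <= upper:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(mean_confidence_interval([9, 10, 11, 10], sigma=2))
    print(coverage(mu=0.0, sigma=1.0, n=25))
```

Under these assumptions the simulated coverage should be close to $1 - \alpha$, which is the sense in which $1 - \alpha$ measures the confidence placed in the interval estimator.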

c. Testing is a statistical inference that tests one 

hypothesis concerning the population model against 
a second hypothesis concerning it, usually in terms 
of its parameter(s). 

(i) A statistical hypothesis, H, is an hypothesis 
concerning the parameter (s) of the population 
model. 

(ii) The null hypothesis, $H_0$, is the statistical 
hypothesis which is to be tested against the 
alternative hypothesis, $H_1$. 

(iii) A test of the null hypothesis, $H_0$, against the 
alternative hypothesis, $H_1$, is the specification 
of all samples that are sufficiently critical of 
$H_0$, or have a sufficiently low probability of 
occurring if $H_0$ were true, for $H_0$ to be rejected 
in favor of $H_1$. 

(iv) The null hypothesis, $H_0$, will then be rejected 
when the sample, $X_1, X_2, \ldots, X_n$, is sufficiently 
critical of it; and will be accepted when the sample, 
$X_1, X_2, \ldots, X_n$, is not sufficiently critical of it. 

(v) The two types of errors that can be made are an 
error of rejecting the null hypothesis, $H_0$, when 
it is true, and an error of accepting $H_0$ when it 
is false. 
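
The two types of errors can be made concrete with a small sketch. The setting is hypothetical and not taken from the outline: a coin is tossed n times, $H_0$ states that the probability of heads is p_null, $H_1$ states that it is p_alt, and the test rejects $H_0$ when the number of heads is at least critical_value. The function names are assumptions as well. The probabilities of both errors are then computed exactly from the binomial distribution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def error_probabilities(n, critical_value, p_null, p_alt):
    """Error probabilities for the test that rejects H0 when heads >= critical_value."""
    # Rejecting H0 when it is true: heads >= critical_value under p_null.
    reject_true_null = sum(
        binomial_pmf(k, n, p_null) for k in range(critical_value, n + 1)
    )
    # Accepting H0 when it is false: heads < critical_value under p_alt.
    accept_false_null = sum(
        binomial_pmf(k, n, p_alt) for k in range(critical_value)
    )
    return reject_true_null, accept_false_null


if __name__ == "__main__":
    print(error_probabilities(n=10, critical_value=8, p_null=0.5, p_alt=0.8))
```

Raising critical_value makes the first error less likely and the second more likely, so specifying a test amounts to choosing a compromise between the two.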

d. Regression analysis treats statistical inference regarding 
the representation of a composite variable as a combination 
of component variables plus an error; and it is basic 
to many statistical applications, especially those concerned 
with the modeling, prediction, and control and optimization 
stages in the evolution of a discipline from an art 
into a science. 



(i) The necessary notation is now introduced. 

(a) t represents time or some other auxiliary 
variable. 

(b) $X_1(t), X_2(t), \ldots, X_p(t)$ denote p component 
(factor, input or independent) variables at 
time, t. 

(c) $\beta_1, \beta_2, \ldots, \beta_p$ are unspecified constants 
(coefficients). 

(d) Y(t) denotes the composite (response, output 
or dependent) variable at time, t. 

(e) $\epsilon(t)$ represents the error contained in the 
representation of Y(t). 

(f) $(X_{1i}, X_{2i}, \ldots, X_{pi}, Y_i) = [X_1(t_i), X_2(t_i), \ldots, X_p(t_i), Y(t_i)]$ 
symbolizes corresponding 
observations of the p component variables and 
the composite variable at time, $t_i$. 

(g) $f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p)$ 
denotes the combination of $X_1(t), X_2(t), \ldots, X_p(t)$. 

(h) $\epsilon_i = Y_i - f(X_{1i}, X_{2i}, \ldots, X_{pi}; \beta_1, \beta_2, \ldots, \beta_p)$ 
is an unknown, but implicit, 
observation of $\epsilon(t)$ at time, $t_i$. 

(ii) There are three equivalent statements of the 
regression problem. 

(a) Physical statement: Based upon $(X_{1i}, X_{2i}, \ldots, X_{pi}, Y_i)$ 
for i = 1, 2, ..., n, obtain the 
optimum representation of Y(t) in the form, 

$Y(t) = f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p) + \epsilon(t)$. 

(b) Statistical statement: Based upon $(X_{1i}, X_{2i}, \ldots, X_{pi}, Y_i)$ 
for i = 1, 2, ..., n, obtain the 
optimum estimators of $\beta_1, \beta_2, \ldots, \beta_p$ in the 
representation of Y(t) in the form, 

$Y(t) = f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p) + \epsilon(t)$. 

(c) Graphical statement: Based upon $(t_i, Y_i)$ 
for i = 1, 2, ..., n, obtain the optimum 
representation of Y(t) in the form, 

$Y(t) = f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p) + \epsilon(t)$. 

[Figure: The Regression Problem. For an observation $(t_i, Y_i)$, 
$t_i - t_i^*$ is the horizontal deviation, and 
$Y_i - f(X_{1i}, X_{2i}, \ldots, X_{pi}; \beta_1, \beta_2, \ldots, \beta_p)$ is the vertical deviation, 
of $(t_i, Y_i)$ from the corresponding point on a representation of Y(t) in 
the form, $Y(t) = f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p) + \epsilon(t)$.] 

(iii) Although many criteria may be utilized in selecting 
an optimum solution to the regression problem (one 
might select that solution which minimizes such 
quantities as 

$\sum_{i=1}^{n} |t_i - t_i^*|$ for horizontal deviations, 

$\sum_{i=1}^{n} [Y_i - f(X_{1i}, X_{2i}, \ldots, X_{pi}; \beta_1, \beta_2, \ldots, \beta_p)]^2$ for vertical deviations, 

or $\lambda \max |t_i - t_i^*| + (1 - \lambda) \max |Y_i - f(X_{1i}, X_{2i}, \ldots, X_{pi}; \beta_1, \beta_2, \ldots, \beta_p)|$ 
for a linear combination 
of horizontal and vertical deviations), 
the least squares criterion selects that 
solution which minimizes 

$Q = \sum_{i=1}^{n} [Y_i - f(X_{1i}, X_{2i}, \ldots, X_{pi}; \beta_1, \beta_2, \ldots, \beta_p)]^2$. 





(a) Because Q is a standard, mathematical 
measure of squared distance, the least 
squares criterion is meaningful 
mathematically. 

(b) The least squares criterion also is 
relatively simple to implement 
mathematically. 

(c) The appropriateness of the least squares 
criterion is determined actually by the 
probability distribution of $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$. 

(d) Adrien-Marie Legendre first published the 
least squares criterion in 1806, but Carl 
Friedrich Gauss first formulated it in an 
unpublished manuscript of 1802. 



(iv) When the combination of component variables is 
linear 

$\left(f(X_1(t), X_2(t), \ldots, X_p(t); \beta_1, \beta_2, \ldots, \beta_p) = \sum_{j=1}^{p} \beta_j X_j(t)\right)$, 

the solution of the 
regression problem is considerably easier to 
obtain mathematically. 

(v) As an example, consider $Y(t) = \beta_1 + \beta_2 X(t) + \epsilon(t)$ 
(that is, for all t: $X_1(t) = 1$, $X_2(t) = X(t)$, 
and $X_j(t) = 0$ for j = 3, 4, ..., p). 
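
For this straight-line case, the least squares criterion can be sketched in a few lines. The sketch is illustrative only: it uses the standard closed-form solution for a line with intercept (which this part of the outline does not itself derive), hypothetical data, and hypothetical function names. The function fit_line returns the values of $\beta_1$ and $\beta_2$ that minimize Q, and sum_of_squares evaluates Q for any candidate pair of coefficients.

```python
def fit_line(xs, ys):
    """Least squares intercept (beta1) and slope (beta2) for Y = beta1 + beta2 * X + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta2 = sxy / sxx  # requires at least two distinct x values
    beta1 = y_bar - beta2 * x_bar
    return beta1, beta2


def sum_of_squares(beta1, beta2, xs, ys):
    """Q: the sum of squared vertical deviations from the line beta1 + beta2 * x."""
    return sum((y - (beta1 + beta2 * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3]  # hypothetical observations of X(t_i)
    ys = [1, 2, 2, 3]  # hypothetical observations of Y(t_i)
    b1, b2 = fit_line(xs, ys)
    print(b1, b2, sum_of_squares(b1, b2, xs, ys))
    # Any other line should give a larger Q for the same data.
    print(sum_of_squares(b1 + 0.1, b2, xs, ys))
```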